On Estimating the Perimeter Using the Alpha-Shape

Ery Arias-Castro* and Alberto Rodriguez Casal†


Abstract 

We consider the problem of estimating the perimeter of a smooth domain in the plane 
based on a sample from the uniform distribution over the domain. We study the performance 
of the estimator defined as the perimeter of the alpha-shape of the sample. Some numerical 
experiments corroborate our theoretical findings. 

Keywords: perimeter estimation; α-shape; r-convex hull; rolling condition; sets with positive reach.

1 Introduction 

The problem of recovering topological and geometric information about the support of a distribution based on a sample has received a considerable amount of attention in a number of fields, such as computational geometry, computer vision, image analysis, clustering or pattern recognition. This includes, for example, estimating the number of connected components (Biau et al., 2007), the intrinsic dimensionality (Levina and Bickel, 2005) and, more generally, the homology (Carlsson, 2009; Chazal and Lieutier, 2005; Niyogi et al., 2008; Robins, 1999; Zomorodian and Carlsson, 2005), the Minkowski content (Cuevas et al., 2007a), as well as the perimeter and area (Bräker and Hsing, 1998; Rényi and Sulanke, 1964). The estimation of the support or, more generally, of the level sets of a density is itself a rich line of research (Cadre, 2006; Polonik, 1995; Rodríguez Casal, 2007; Singh et al., 2009; Tsybakov, 1997; Walther, 1997). A closely related topic is that of set estimation (Cuevas and Fraiman, 2010; Mammen and Tsybakov, 1995). We refer the reader to the classic book of Korostelev and Tsybakov (1993), which treats a number of these topics.

We focus here on the problem of estimating the perimeter of the support. Concretely, we are given a set of points $\mathcal{X}_n = \{X_1, \dots, X_n\}$, which we assume are independently sampled uniformly at random from an unknown compact set $S \subset \mathbb{R}^2$, and our goal is to estimate the perimeter of $S$, by which we mean the length of its boundary. Let $\partial S$ denote the boundary of a set $S \subset \mathbb{R}^2$, namely $\partial S = \bar{S} \cap \overline{S^c}$, where $\bar{S}$ denotes the closure of $S$ and $S^c = \mathbb{R}^2 \setminus S$ is the complement of $S$.

1.1 Related work 

Rényi and Sulanke (1964) address this problem under the assumption that $S$ is convex, and estimate its perimeter by the perimeter of the convex hull of the sample $\mathcal{X}_n$. They obtain the precise rate of convergence in expectation, which is of order $O(n^{-2/3})$ when the boundary $\partial S$ has bounded curvature. They also obtain an analogous result for the problem of estimating the area of $S$. Bräker and Hsing (1998) extend their results to other sampling distributions. See (Reitzner, 2010) for a review of more recent results on the convex hull of a random sample.
There is a series of papers that consider the problem of estimating the surface area of the boundary of a more general class of supports $S$, but under a different sampling scheme where two samples are given, one from the uniform distribution on $S$ and another from the uniform distribution on $G \setminus S$, where $G$ is a bounded set containing $S$. In that line, Cuevas et al. (2007b) aim at estimating the Minkowski content of $\partial S$, and introduce an estimator that is proved to be consistent under weak assumptions on the set $S$. They obtain a convergence rate in dimension 2 when $\partial S$ has bounded curvature, in which case the Minkowski content coincides with the perimeter. Pateiro-López and Rodríguez-Casal (2008, 2009) follow their work and propose a different estimator, very closely related to the one we study here, obtaining an improved rate of convergence in dimension 2. Continuing this line of work, Jiménez and Yukich (2011) propose an estimator of the perimeter of $S$ based on a Delaunay triangulation, which is shown to be consistent under mild assumptions on $S$.

Also closely related is the work of Kim and Korostelev (2000) in the context of binary images, which includes the estimation of the length of the boundary of a horizon of the form $\{(x, y) \in [0,1]^2 : y \le g(x)\}$, where $g : [0,1] \to [0,1]$ is a function with Hölder regularity. See Section 6 for further comments.

1.2 The r-rolling condition 

A set $S$ is said to fulfill the r-rolling condition if for any $x \in \partial S$ there is an open ball $B$ with radius $r$ such that $B \cap S = \emptyset$ and $x \in \partial B$. In this paper, we work under the assumption that $S$ satisfies the following condition:

$S$ is a compact subset of $\mathbb{R}^2$ such that both $S$ and $S^c$ satisfy the r-rolling condition.

From a geometrical point of view, we are assuming that a ball of radius $r$ can roll inside $S$ and inside $S^c$. This rolling condition implies that, for any $x \in \partial S$, there are two open balls $B^+$ and $B^-$ such that $x \in \partial B^+ \cap \partial B^-$, $B^+ \subset S$ and $B^- \subset S^c$. In fact, it can easily be seen (Pateiro-López, 2008, Lemma A.0.1) that this is only possible if there is a (unique) unit vector $\eta_x$ (the unit normal vector at $x$ pointing outward) such that $B^+ = B(x - r\eta_x, r)$ and $B^- = B(x + r\eta_x, r)$, where $B(c, \epsilon)$ denotes the open ball with radius $\epsilon$ and center $c \in \mathbb{R}^2$. See (Walther, 1999) for a comprehensive discussion, including a relation to Serra's regular model and mathematical morphology. The r-rolling condition is closely linked to the notion of r-convexity. A set $S$ is said to be r-convex if for any point $x \notin S$ there is an open ball $B$ of radius $r$ such that $x \in B$ and $B \cap S = \emptyset$ (Perkal, 1956; Walther, 1997). It is known that, if both $S$ and $S^c$ satisfy the r-rolling condition, then $S$ and $S^c$ are r-convex; see (Pateiro-López, 2008, Lemma A.0.8) and also (Walther, 1999).

The r-rolling condition is also connected with the notion of sets of positive reach introduced in the seminal paper (Federer, 1959). For a nonempty set $S \subset \mathbb{R}^2$ and $x \in \mathbb{R}^2$, define

$$\operatorname{dist}(x, S) = \inf\{\|x - s\| : s \in S\},$$

where $\|\cdot\|$ stands for the Euclidean norm. The reach of a set $S$, denoted $\rho(S)$, is the supremum of the $r > 0$ such that every point of $\{x : \operatorname{dist}(x, S) < r\}$ has a unique nearest point in $S$, that is, a unique point realizing $\inf\{\|x - s\| : s \in S\}$. For twice differentiable submanifolds (e.g., curves), the reach bounds the radius of curvature from below, i.e., the curvature is at most $1/\rho(S)$ (Federer, 1959, Lem. 4.17). Also, if $S$ and $S^c$ satisfy the r-rolling condition, then $\rho(\partial S) \ge r$; see (Pateiro-López, 2008, Lemma A.0.6). Using results in (Cuevas et al., 2012), it follows easily that the converse is true if, in addition, $S$ is equal to the closure of its interior.
1.3 The estimator


Our estimator for the perimeter of $S$ is the perimeter of the α-shape of $\mathcal{X}_n$, for some fixed $0 < \alpha < r$. The α-shape of $\mathcal{X}_n$ is the polygon, denoted $C_\alpha(\mathcal{X}_n)$, whose edges, which we call α-edges, are defined as follows (Edelsbrunner et al., 1983). A pair $(X_i, X_j)$ forms an α-edge if there is an open ball $B$ of radius $\alpha$ such that $X_i, X_j \in \partial B$ and $B \cap \mathcal{X}_n = \emptyset$. If $\alpha$ is large enough, the α-shape coincides with the convex hull of the sample. For smaller $\alpha$, the α-shape is not necessarily convex. See Figure 1 for an illustration. The α-shape is well known in the computational geometry literature for producing good global reconstructions when the sample points are (approximately) uniformly distributed in the set $S$. Moreover, it can be computed efficiently in time $O(n \log n)$. See (Edelsbrunner, 2010) for a survey.
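To make the definition of an α-edge concrete, here is a minimal Python sketch that tests it directly: for each pair of sample points at distance at most $2\alpha$, it computes the two centers of the circles of radius $\alpha$ through both points and checks whether either open disk is free of the other points. This brute-force check runs in $O(n^3)$ time and only illustrates the definition; it is not the $O(n \log n)$ algorithm referred to above, and the function names `alpha_edges` and `alpha_shape_perimeter` are hypothetical.

```python
import math
import random


def alpha_edges(points, alpha):
    """Return all pairs (i, j), i < j, forming an alpha-edge: some open disk of
    radius alpha has points[i] and points[j] on its boundary and contains no
    sample point. Brute force, O(n^3): a sketch of the definition only."""
    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = points[i], points[j]
            dx, dy = xj - xi, yj - yi
            d = math.hypot(dx, dy)
            if d == 0 or d > 2 * alpha:
                continue  # no circle of radius alpha passes through both points
            # Both candidate centers lie on the perpendicular bisector of the pair.
            mx, my = (xi + xj) / 2, (yi + yj) / 2
            offset = math.sqrt(max(alpha**2 - (d / 2) ** 2, 0.0))
            nx, ny = -dy / d, dx / d  # unit normal to the segment
            for sign in (1.0, -1.0):
                cx, cy = mx + sign * offset * nx, my + sign * offset * ny
                # Open disk: a point lying on the circle itself is not inside.
                empty = all(
                    math.hypot(px - cx, py - cy) >= alpha - 1e-12
                    for k, (px, py) in enumerate(points)
                    if k not in (i, j)
                )
                if empty:
                    edges.append((i, j))
                    break
    return edges


def alpha_shape_perimeter(points, alpha):
    """Total length of the alpha-edges, i.e. the perimeter of the alpha-shape."""
    return sum(math.dist(points[i], points[j]) for i, j in alpha_edges(points, alpha))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = []
    for _ in range(200):  # uniform sample from the unit disk (perimeter 2*pi)
        rad, ang = math.sqrt(rng.random()), 2 * math.pi * rng.random()
        sample.append((rad * math.cos(ang), rad * math.sin(ang)))
    print(alpha_shape_perimeter(sample, 0.3), 2 * math.pi)
```

For the four corners of a unit square and a large $\alpha$, the sketch returns the four sides (the convex hull), while the diagonals are rejected because each circle through a diagonal contains another corner.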

Cuevas et al. (2012) estimate the perimeter of $S$ by the outer Minkowski content of the r-convex hull of the sample, defined as the smallest r-convex set that contains the sample. Since the boundary of that set is smooth except at a finite number of points, its outer Minkowski content coincides with its perimeter. See (Ambrosio et al., 2008) for a broader correspondence between these two quantities. Cuevas et al. (2012) show that this estimator is consistent, but no convergence rate is provided. Note that, for large sample sizes, both estimators are quite similar; see Proposition 2 for a formal statement. From the computational point of view, the α-shape of the sample tends to be more stable with respect to the value of $\alpha$, and is faster to compute over a range of values of $\alpha$: the latter can be done in $O(n \log n)$ time, since the α-shape changes only a finite number of times as $\alpha$ varies. The α-convex hull of the sample does not enjoy such properties.





Figure 1: The α-shape of a sample of size $n = 500$ from the uniform distribution on a thick letter S, for $\alpha = 1$ (left), $\alpha = 0.06$ (center) and $\alpha = 0.035$ (right). Note that in the second case the α-shape is made of two disconnected closed curves.


1.4 Main results 

Let $\lambda$ denote the one-dimensional Hausdorff measure in $\mathbb{R}^2$, normalized so that it equals 1 for a line segment of length 1, and let $\operatorname{diam}(A) = \sup\{\|x - y\| : x, y \in A\}$ denote the diameter of a set $A \subset \mathbb{R}^2$.

Theorem 1. Let $\mathcal{X}_n = (X_1, \dots, X_n)$ be an independent sample from the uniform distribution on a compact set $S \subset \mathbb{R}^2$ such that $S$ and $S^c$ satisfy the r-rolling condition. Fix $\alpha \in (0, r)$. There is a constant $A$ depending only on $(\alpha, r, \operatorname{diam}(S))$ and $t_0 > 0$ depending only on $(\alpha, r)$ such that, for all $0 < t < t_0$,

$$\mathbb{P}\left( \left| \frac{\lambda(C_\alpha(\mathcal{X}_n))}{\lambda(\partial S)} - 1 \right| > t \right) \le A n^2 \exp(-n t^{3/2}/A). \tag{1}$$


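Remark 1. In particular, defining $\varepsilon_n = (3A\log(n)/n)^{2/3}$, with probability one,

$$(1 - \varepsilon_n)\lambda(\partial S) \le \lambda(C_\alpha(\mathcal{X}_n)) \le (1 + \varepsilon_n)\lambda(\partial S)$$

eventually, by applying the Borel–Cantelli lemma. So, the convergence rate of $\lambda(C_\alpha(\mathcal{X}_n))$ as an estimator of $\lambda(\partial S)$ is, up to a log factor, of order $n^{-2/3}$.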

Remark 2. We will argue later on that the same result also holds for the perimeter of the α-convex hull of the sample, thus refining the consistency result established in (Cuevas et al., 2012). See the discussion in Section 6.


1.5 Content 

The remainder of the paper is largely devoted to proving Theorem 1. In Section 2 we establish some auxiliary geometrical results. Section 3 is dedicated to the study of α-edges. Theorem 1 is proved in Section 4. Some numerical experiments are presented in Section 5. We discuss some extensions and open problems in Section 6.

1.6 Notation and preliminaries 

We start by introducing some notation and some general concepts. Let $\mu(A)$ denote the Lebesgue measure of a measurable set $A \subset \mathbb{R}^2$. For a pair of distinct points $x_1, x_2 \in \mathbb{R}^2$, let $(x_1 x_2)$ denote the line passing through $x_1$ and $x_2$, and let $[x_1 x_2]$ denote the line segment with endpoints $x_1$ and $x_2$. For a nonempty set $A \subset \mathbb{R}^2$ and $\epsilon > 0$, define

$$B(A, \epsilon) = \{x \in \mathbb{R}^2 : \operatorname{dist}(x, A) < \epsilon\}.$$

If $A = \{x\}$ is a singleton, we use the notation $B(x, \epsilon)$ (resp. $\bar{B}(x, \epsilon)$) instead of $B(\{x\}, \epsilon)$ for denoting the open (resp. closed) ball of radius $\epsilon > 0$ and center $x \in \mathbb{R}^2$. Let $P_A$ denote the metric projection onto a set $A$, i.e., $P_A(x) = \operatorname{arg\,min}_{a \in A} \|x - a\|$, which is a singleton when $\operatorname{dist}(x, A) < \rho(A)$. For two nonempty sets $C, D \subset \mathbb{R}^2$, let $\mathcal{H}(C, D)$ denote their Hausdorff distance, defined as

$$\mathcal{H}(C, D) = \inf\{\epsilon > 0 : C \subset B(D, \epsilon) \text{ and } D \subset B(C, \epsilon)\}.$$
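For finite point sets, the infimum in the definition of the Hausdorff distance reduces to the larger of the two directed distances $\max_{c \in C} \operatorname{dist}(c, D)$ and $\max_{x \in D} \operatorname{dist}(x, C)$. The following minimal Python sketch computes it that way; it is restricted to finite sets (curves would first have to be discretized), and the function name `hausdorff` is hypothetical.

```python
import math


def hausdorff(C, D):
    """Hausdorff distance between two nonempty finite point sets in R^2."""
    def directed(P, Q):
        # Largest distance from a point of P to its nearest point of Q.
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(directed(C, D), directed(D, C))


print(hausdorff([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0)]))  # 1.0
```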

For a curve $C \subset \mathbb{R}^2$ and $x \in C$, $C_x$ denotes the tangent subspace of $C$ at $x$ when it exists. For two curves $C$ and $D$, respectively differentiable almost everywhere and differentiable, and such that $\rho(D) > r$ and $C \subset B(D, r)$, define the deviation angle of $C$ with respect to $D$ as

$$\angle(C, D) = \sup_{x \in C} \angle(C_x, D_{P_D(x)}),$$

where $\angle(C_x, D_{P_D(x)}) \in [0, \pi/2]$ denotes the angle between the tangent spaces of $C$ and $D$ at $x$ and $P_D(x)$, respectively (Morvan, 2008). Note that it is not symmetric in $C$ and $D$.

Where they appear, $\alpha$ and $r$ are fixed. Everywhere in the proofs, a constant depends at most on $\alpha$, $r$ and the diameter of $S$. We will leave this dependence implicit most of the time.

We let $n$ denote the sample size throughout. We say that an event holds with high probability if it happens with probability at least $1 - Ae^{-n/A}$ for some constant $A > 0$.
2 Some geometrical results

In this section we gather a few geometrical results that we will use later on in the paper. 

Lemma 1. Let $S \subset \mathbb{R}^2$ be such that $S$ and $S^c$ satisfy the r-rolling condition. Any ball of radius $\alpha > 0$ with center in $S$ contains a ball of radius $\frac{1}{2}\min\{\alpha, r\}$ included in $S$.

Proof. Let $\Gamma$ be a shorthand for $\partial S$. First, we analyze the case $\alpha \le r$. If $z \in S$ satisfies $\operatorname{dist}(z, \Gamma) \ge \alpha$, then $B(z, \alpha) \subset S$. Now, take $z \in S$ such that $\operatorname{dist}(z, \Gamma) < \alpha$ and let $y$ be the metric projection of $z$ onto $\Gamma$, which is well-defined since $\operatorname{dist}(z, \Gamma) < \alpha \le r \le \rho(\Gamma)$. By the r-rolling property, there is an open ball $B$ of radius $r$ tangent to $\Gamma$ at $y$ that contains $z$ and such that $B \subset S$. Therefore $B(z, \alpha) \cap B$ contains the ball of radius $\alpha/2$ tangent to $\Gamma$ at $y$ that contains $z$. See Figure 2 for an illustration. This concludes the proof for $\alpha \le r$. If $\alpha > r$, the ball of radius $\alpha$ contains the ball of radius $r$ with the same center. By what we just did, that ball contains a ball of radius $r/2$ which is included in $S$. □



Figure 2: Illustrates the proof of Lemma 1. The thick, parabolic line represents a portion of $\Gamma = \partial S$.

Recall that $\mu$ denotes the Lebesgue measure on $\mathbb{R}^2$.

Lemma 2. Let $S \subset \mathbb{R}^2$ be measurable and such that $S$ and $S^c$ satisfy the r-rolling condition. For any $\alpha \le r$, there is a numeric constant $A > 0$ depending only on $\alpha$ such that, for any $z \notin S$,

$$\mu(B(z, \alpha) \cap S) \ge A \max(0, \alpha - \operatorname{dist}(z, \partial S))^{3/2}.$$

Proof. Let $\Gamma$ be a shorthand for $\partial S$. It suffices to consider $z \notin S$ such that $h = \alpha - \operatorname{dist}(z, \Gamma) > 0$. Let $y$ be the metric projection of $z$ onto $\Gamma$, which is well-defined since $\operatorname{dist}(z, \Gamma) < \alpha \le r \le \rho(\Gamma)$, and let $B$ be the open ball of radius $\alpha$ tangent to $\Gamma$ at $y$ and contained within $S$. It is clear that $\mu(B(z, \alpha) \cap S) \ge \mu(B(z, \alpha) \cap B)$. The intersection $B(z, \alpha) \cap B$ is the union of two circular caps, symmetric with respect to the line joining the two points of $\partial B(z, \alpha) \cap \partial B$. See Figure 3 for an illustration. The centers of the two balls are at distance $2\alpha - h$, so each cap has radius $\alpha$ and height $h/2$. If $C$ denotes one of them, we therefore have $\mu(B(z, \alpha) \cap B) = 2\mu(C)$, and its area is equal to

$$\mu(C) = 2\alpha^2 \int_0^{\operatorname{acos}(1 - h/(2\alpha))} \sin^2(t)\, dt.$$

Using the bound $\sin(t) \ge 2t/\pi$, valid for $t \in [0, \pi/2]$, and the bound $\operatorname{acos}(1 - t) \ge \sqrt{2t}$, valid for $t \in [0, 1]$, we obtain $2\mu(C) \ge A h^{3/2}$ with $A = 16\sqrt{\alpha}/(3\pi^2)$. □
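The cap computation can be checked numerically. The Python sketch below works in the setting of the proof (two disks of radius $\alpha$ whose centers are $2\alpha - h$ apart), evaluates the exact area of the lens $B(z, \alpha) \cap B$ with the standard formula for the intersection of two equal disks, and compares it with $A h^{3/2}$ for $A = 16\sqrt{\alpha}/(3\pi^2)$. The function names are hypothetical and the grid of $(\alpha, h)$ values is arbitrary.

```python
import math


def lens_area(alpha, h):
    """Exact area of the intersection of two open disks of radius alpha whose
    centers are at distance D = 2*alpha - h (0 < h <= alpha)."""
    D = 2 * alpha - h
    return 2 * alpha**2 * math.acos(D / (2 * alpha)) - (D / 2) * math.sqrt(4 * alpha**2 - D**2)


def cap_lower_bound(alpha, h):
    """The lower bound A * h**1.5 with A = 16 * sqrt(alpha) / (3 * pi**2)."""
    return 16 * math.sqrt(alpha) / (3 * math.pi**2) * h**1.5


for alpha in (0.1, 1.0, 3.0):
    worst = min(lens_area(alpha, alpha * k / 100) / cap_lower_bound(alpha, alpha * k / 100)
                for k in range(1, 101))
    print(alpha, worst)  # smallest ratio area / bound over the grid of h values
```

Since both sides scale like $\alpha^2$ when $h/\alpha$ is fixed, the ratio depends only on $h/\alpha$, which is why the printed minima agree across values of $\alpha$.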
Figure 3: Illustrates the proof of Lemma 2. The thick, parabolic line represents a portion of $\Gamma = \partial S$. The intersection of the two balls is the region of interest.


For the following result, we use some heavy machinery from the seminal work of Federer (1959). For a set $T \subset \mathbb{R}^2$, let $\chi(T)$ denote its Euler–Poincaré characteristic, and recall that $\lambda(T)$ denotes its length.

Lemma 3. Suppose $S \subset \mathbb{R}^2$ is compact, with both $S$ and $S^c$ satisfying the r-rolling condition. There are constants $A_0, A_1 > 0$ depending only on $r$ and $\operatorname{diam}(S)$ such that $|\chi(\partial S)| \le A_0$ and $\lambda(\partial S) \le A_1$.

Proof. Let $\Gamma = \partial S$ and $d = \operatorname{diam}(S)$, and assume, without loss of generality, that $S \subset B(0, d)$. For a given $T$ such that $\rho(T) \ge r$, let $\Phi_k$ denote the $k$th curvature measure associated with $T$, $k \in \{0, 1, 2\}$, as defined in (Federer, 1959, Def. 5.7). In (Federer, 1959, Rem. 5.10) we find that

$$\sup\{|\Phi_k|(T) : T \subset B(0, d),\ \rho(T) \ge r\} < \infty, \tag{2}$$

where $|\Phi_k|(T)$ is the total variation of $\Phi_k$ over $T$. Now, by (Federer, 1959, Rem. 6.14), $\Phi_1(\Gamma)$ coincides with the one-dimensional Hausdorff measure, so that $|\Phi_1|(\Gamma) = \Phi_1(\Gamma) = \lambda(\Gamma)$. From this, we deduce the existence of $A_1$. By (Federer, 1959, Th. 5.19), $\Phi_0(\Gamma)$ coincides with $\chi(\Gamma)$ and, by (2) for $k = 0$, we get that there is some constant $A_0$ such that $|\Phi_0(\Gamma)| \le |\Phi_0|(\Gamma) \le A_0$. □

We define an ε-net of a set $S$ as any subset of points $x_1, \dots, x_m \in S$ such that $\|x_j - x_k\| \ge \epsilon$ when $j \ne k$, and such that, for any $x \in S$, $\|x - x_j\| < \epsilon$ for some $j = 1, \dots, m$. Note that any bounded set $S \subset \mathbb{R}^2$ admits an ε-net of finite cardinality.
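A finite ε-net in this sense can be built greedily: scan the points and keep one whenever it lies at distance at least $\epsilon$ from every point kept so far. The Python sketch below applies this to a finite point cloud standing in for $S$ (an assumption made only for illustration, since $S$ itself is a continuum). The kept points are ε-separated by construction, and every discarded point lies at distance less than $\epsilon$ from a kept one. The name `greedy_eps_net` is hypothetical.

```python
import math


def greedy_eps_net(points, eps):
    """Greedy eps-net of a finite point cloud.

    A point is kept iff its distance to every previously kept point is >= eps,
    so kept points are pairwise >= eps apart; a discarded point was rejected
    because some kept point is at distance < eps from it."""
    net = []
    for p in points:
        if all(math.dist(p, q) >= eps for q in net):
            net.append(p)
    return net


grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]  # unit square
net = greedy_eps_net(grid, 0.2)
# Packing bound from the proof of Lemma 4 with diam = sqrt(2): 16 * diam**2 / eps**2.
print(len(net), 16 * 2 / 0.2**2)
```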

Lemma 4. For any bounded $S \subset \mathbb{R}^2$, there is a constant $A$ depending only on $\operatorname{diam}(S)$ such that, for any $0 < \epsilon < \operatorname{diam}(S)$, any ε-net for $S$ has cardinality bounded by $A\epsilon^{-2}$. If, in addition, both $S$ and $S^c$ satisfy the r-rolling condition, then there is a constant $A'$ depending only on $r$ and $\operatorname{diam}(S)$ such that any ε-net for $\partial S$ has cardinality bounded by $A'\epsilon^{-1}$.

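Proof. Assume without loss of generality that $S \subset B(0, d)$, where $d = \operatorname{diam}(S)$. Let $x_1, \dots, x_m$ be an ε-net of $S$. Since $B(x_j, \epsilon/2) \cap B(x_k, \epsilon/2) = \emptyset$ when $j \ne k$, we have

$$\pi d^2 \ge \sum_{j=1}^m \mu(B(0, d) \cap B(x_j, \epsilon/2)) \ge m\pi(\epsilon/4)^2,$$

using Lemma 1 (applied to the disk $\bar{B}(0, d)$, which together with its complement satisfies the d-rolling condition) in the last inequality. We therefore have $m \le 16d^2/\epsilon^2$. This proves the first part.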






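For the second part, let $\Gamma = \partial S$. It is enough to show the result for $\epsilon \le 2r$. Note that $2r \le d$ by the r-rolling condition on $S$. Let $y_1, \dots, y_{m'}$ be an ε-net of $\Gamma$. Since $B(y_j, \epsilon/2) \cap B(y_k, \epsilon/2) = \emptyset$ when $j \ne k$, we have

$$m'\pi\left(\frac{\epsilon}{2}\right)^2 = \mu\left(\bigcup_{j=1}^{m'} B(y_j, \epsilon/2)\right) \le \mu(B(\Gamma, \epsilon/2)). \tag{3}$$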

By (Federer, 1959, Th. 5.6), we have

$$\mu(B(\Gamma, \epsilon/2)) = \epsilon\,\Phi_1(\Gamma) + \frac{\pi}{4}\epsilon^2\,\Phi_0(\Gamma),$$

where $\Phi_1(\Gamma) = \lambda(\Gamma)$ (Federer, 1959, Rem. 6.14) and $\Phi_0(\Gamma)$ is the Euler–Poincaré characteristic of $\Gamma$ (Federer, 1959, Th. 5.19). By Lemma 3, there are positive constants $A_0, A_1$ depending only on $r$ and $d$ such that $\lambda(\Gamma) \le A_1$ and $|\Phi_0(\Gamma)| \le A_0$, yielding

$$\mu(B(\Gamma, \epsilon/2)) \le A_1\epsilon + A_0\frac{\pi}{4}\epsilon^2 \le A_2\epsilon,$$

where $A_2 = A_1 + A_0(\pi/4)d$, using the fact that $\epsilon \le d$. Plugging this into (3), we conclude the proof of the second part. □
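As a sanity check of the Steiner-type formula used above, consider the simplest case allowed by the assumptions (an example of ours, not part of the original argument): $S$ a closed disk of radius $R \ge r$, so that $\Gamma$ is a circle with $\rho(\Gamma) = R$, $\lambda(\Gamma) = 2\pi R$ and Euler–Poincaré characteristic $0$. For $\epsilon \le 2r \le 2R$ the tube $B(\Gamma, \epsilon/2)$ is an annulus, and the bound (3) becomes explicit:

```latex
\mu(B(\Gamma, \epsilon/2))
  = \pi\Big(R + \frac{\epsilon}{2}\Big)^2 - \pi\Big(R - \frac{\epsilon}{2}\Big)^2
  = 2\pi R\,\epsilon
  = \epsilon\,\lambda(\Gamma) + \frac{\pi}{4}\epsilon^2 \cdot 0,
\qquad
m' \le \frac{2\pi R\,\epsilon}{\pi\epsilon^2/4} = \frac{8R}{\epsilon}.
```

So the number of points in an ε-net of the circle grows like $\epsilon^{-1}$, in line with the second part of the lemma.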

Next, we establish some basic properties of a line segment joining two points on a circle that barely intersects a set with smooth boundary.

Lemma 5. Let $S \subset \mathbb{R}^2$ be such that both $S$ and $S^c$ satisfy the r-rolling condition. Fix $\alpha \in (0, r)$ and $0 < t < \min\{\alpha, 2\alpha^2/r\}$. There is a constant $A > 0$ depending only on $(r, \alpha)$ such that, for any $z \notin S$ with $0 < \alpha - \operatorname{dist}(z, S) < t/A$ and any $x_1, x_2 \in \partial B(z, \alpha) \cap S$, we have

$$[x_1 x_2] \subset B(\partial S, t), \tag{4}$$

$$\|x_1 - x_2\| < \sqrt{t}, \tag{5}$$

$$\angle([x_1 x_2], \partial S) < \sqrt{t}. \tag{6}$$

(The angle in (6) is well defined because of (4) and the bound $t < \alpha < r$.)

Proof. Let $\Gamma$ be a shorthand for $\partial S$. Define $\delta = \alpha - \operatorname{dist}(z, S)$, and let $e_1, e_2$ denote the canonical basis vectors of $\mathbb{R}^2$. Since $p = \operatorname{dist}(z, \Gamma) = \operatorname{dist}(z, S) = \alpha - \delta < r$, the projection $y = P_\Gamma(z)$ is well-defined. Without loss of generality, assume that $y$ is the origin and that the tangent of $\Gamma$ at $y$ is the line spanned by $e_1$. Note that the line $(yz)$ is perpendicular to the tangent at $y$, so that $z$ is on the line spanned by $e_2$, and without loss of generality we assume $z = -pe_2$. Let $B$ be a shorthand for $B(z, \alpha)$ and let $B^+$ (resp. $B^-$) be the open ball centered at $re_2$ (resp. $-re_2$) with radius $r$. Since $S$ and $S^c$ satisfy the r-rolling condition, $B^+ \subset S$ and $B^- \subset S^c$. Let $x^* = \delta e_2$. By construction, $x^*$ belongs to $(yz) \cap \partial B \cap B^+$. See Figure 4 for an illustration.

For any point $x \in B \cap S$,

$$\operatorname{dist}(x, \Gamma) = \operatorname{dist}(x, S^c) \le \operatorname{dist}(x, B^-) \le \operatorname{dist}(x^*, B^-) = \delta.$$

Direct calculations show that $\partial B \cap \partial B^-$ is given by the points $\pm ae_1 - he_2$, where

$$a^2 + (r - h)^2 = r^2, \qquad a^2 + (p - h)^2 = \alpha^2.$$

So, using the fact that $p = \alpha - \delta$, together with $\alpha + p < 2\alpha$ and $r - p > r - \alpha$, we have

$$0 < h = \frac{\alpha^2 - p^2}{2(r - p)} = \frac{(\alpha - p)(\alpha + p)}{2(r - p)} \le \frac{\alpha\delta}{r - \alpha}. \tag{7}$$
Figure 4: Illustrates the proof of Lemma 5. The thick, parabolic line represents a portion of $\Gamma = \partial S$.


To prove (4), take $x \in [x_1 x_2]$. If $x \in S$, then $x \in B \cap S$ and we saw that $\operatorname{dist}(x, \Gamma) \le \delta$. If $x \notin S$, let $C$ be the closure of the intersection of $B$ with the half-plane above the line $\mathbb{R}e_1 - he_2$. Since the part of $B$ below this line is contained in $B^-$ and $B^- \cap S = \emptyset$, necessarily $x_1, x_2 \in C$, which in turn implies that $[x_1 x_2] \subset C$ since $C$ is convex. In particular, $x \in C$, so that $\operatorname{dist}(x, [-ae_1, ae_1]) \le \max\{h, \delta\}$. And since $\operatorname{dist}([-ae_1, ae_1], B^+) \le h$ (by symmetry), we conclude with the triangle inequality and (7) that

$$\operatorname{dist}(x, \Gamma) = \operatorname{dist}(x, S) \le \operatorname{dist}(x, B^+) \le 2\max\{h, \delta\} \le A_1\delta, \tag{8}$$

for $A_1 = 2\max\{\alpha/(r - \alpha), 1\}$. This is valid for any $x \in [x_1 x_2]$, and proves (4) for any $A \ge A_1$.

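To prove (5), we use the fact that $x_1, x_2 \in B \cap S \subset B \setminus B^-$, so that $\|x_1 - x_2\| \le \operatorname{diam}(B \setminus B^-)$, and $\operatorname{diam}(B \setminus B^-) = 2a$ when $h \le p$. This is the case here: our assumptions that $\delta < t/A$ and $t < 2\alpha^2/r$ imply $\delta \le (r - \alpha)\alpha/r$ (recall that $A \ge A_1 \ge 2\alpha/(r - \alpha)$), which, by (7), forces $h \le \alpha^2/r \le \alpha - \delta = p$. Continuing, we then have

$$a^2 = r^2 - (r - h)^2 = h(2r - h) \le 2hr \le A_1 r\delta,$$

by (8). From this we get

$$\|x_1 - x_2\| \le \operatorname{diam}(B \setminus B^-) = 2a \le \sqrt{A_2\delta}, \tag{9}$$

where $A_2 = 4A_1 r$. This proves (5) for any $A \ge \max\{A_1, A_2\}$.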

We turn to proving (6). We first note that $\angle([x_1 x_2], \Gamma)$ is well-defined. Indeed, by assumption $\delta < t/A$ with $A \ge A_1$, so that (4) holds, and $t < \alpha$, so that $[x_1 x_2] \subset B(\Gamma, t) \subset B(\Gamma, \alpha)$; we conclude with the fact that $\rho(\Gamma) \ge r > \alpha$. For any $x \in [x_1 x_2]$ we can therefore compute the point $y' = P_\Gamma(x)$. Using the triangle inequality for angles, we have

$$\angle([x_1 x_2], \Gamma_{y'}) \le \angle([x_1 x_2], \Gamma_y) + \angle(\Gamma_y, \Gamma_{y'}) = \theta_1 + \theta_2. \tag{10}$$


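We first bound $\theta_1$. Direct trigonometric calculations show that

$$\sin(\theta_1) \le \frac{a}{\alpha} \le \frac{\sqrt{A_2\delta}}{2\alpha},$$

where the last inequality comes from (9). Using the fact that $\sin(\theta) \ge 2\theta/\pi$ for all $\theta \in [0, \pi/2]$, we get $\theta_1 \le A_3\sqrt{\delta}$, where $A_3 = \pi\sqrt{A_2}/(4\alpha)$. It remains to bound $\theta_2$ in (10). We have $y = P_\Gamma(x^*)$ and $y' = P_\Gamma(x)$, and $\operatorname{dist}(x^*, \Gamma) = \delta < \alpha$ by construction, and also $\operatorname{dist}(x, \Gamma) < t < \alpha$ because of (4). Hence, by (Federer, 1959, Th. 4.8(8)), we get

$$\|y - y'\| \le \frac{r}{r - \alpha}\|x - x^*\|.$$

Using the fact that $x, x^* \in \bar{B} \setminus B^-$, and then (9), we have $\|x - x^*\| \le \sqrt{A_2\delta}$. Now, if we denote by $\eta_y$ and $\eta_{y'}$ the outward pointing unit normal vectors of $\Gamma$ at $y$ and $y'$ respectively, (Walther, 1997, Th. 1) ensures that

$$\|\eta_y - \eta_{y'}\| \le \frac{1}{r}\|y - y'\|.$$

Since $\langle \eta_y, \eta_{y'} \rangle = \cos\theta_2$, we get

$$\|\eta_y - \eta_{y'}\| = \sqrt{2 - 2\cos\theta_2} = 2\sin(\theta_2/2).$$

We arrive at

$$\sin(\theta_2/2) \le \frac{\sqrt{A_2\delta}}{2(r - \alpha)}.$$

As before, this implies that $\theta_2 \le A_4\sqrt{\delta}$, where $A_4 = \pi\sqrt{A_2}/(2(r - \alpha))$. We conclude that

$$\angle([x_1 x_2], \Gamma_{y'}) \le (A_3 + A_4)\sqrt{\delta} = \sqrt{A_5\delta},$$

with $A_5 = (A_3 + A_4)^2$, which proves (6) for any $A \ge \max\{A_1, A_2, A_5\}$. □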
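The elementary geometry behind (7) and (8) is easy to check numerically. The Python sketch below places $y$ at the origin and $z = -pe_2$ as in the proof, computes $h$ from (7) and $a = \sqrt{r^2 - (r - h)^2}$, and verifies, for a few arbitrarily chosen parameter values, that $(\pm a, -h)$ lies on both $\partial B$ and $\partial B^-$, that $h \le \alpha\delta/(r - \alpha)$, and that $\operatorname{dist}((\pm a, 0), B^+) \le h$. The function name is hypothetical.

```python
import math


def lemma5_chord(r, alpha, delta):
    """Return (p, h, a) for the configuration in the proof of Lemma 5:
    z = (0, -p) with p = alpha - delta, B = B(z, alpha), B^- = B((0, -r), r).
    The circles dB and dB^- meet at the points (+-a, -h)."""
    p = alpha - delta
    h = (alpha**2 - p**2) / (2 * (r - p))
    a = math.sqrt(r**2 - (r - h) ** 2)
    return p, h, a


for r in (1.0, 2.5):
    for alpha in (0.2 * r, 0.5 * r, 0.9 * r):
        for frac in (0.01, 0.1, 0.5):
            delta = frac * alpha
            p, h, a = lemma5_chord(r, alpha, delta)
            on_dB = math.isclose(a**2 + (p - h) ** 2, alpha**2)    # (±a, -h) on dB
            on_dBminus = math.isclose(a**2 + (r - h) ** 2, r**2)   # (±a, -h) on dB^-
            bound_7 = h <= alpha * delta / (r - alpha)              # inequality (7)
            near_Bplus = math.hypot(a, r) - r <= h + 1e-12          # dist((±a, 0), B^+) <= h
            assert on_dB and on_dBminus and bound_7 and near_Bplus
print("all checks passed")
```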


The following is a technical result involving two line segments, one on each of two intersecting circles of the same radius, and a line passing through these line segments.

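Lemma 6. Let $x_0, x_0' \in \mathbb{R}^2$ be such that $0 < \|x_0 - x_0'\| < 2\alpha$, and let $x_1, x_2 \in \partial B(x_0, \alpha) \setminus B(x_0', \alpha)$ and $x_1', x_2' \in \partial B(x_0', \alpha) \setminus B(x_0, \alpha)$. Let $L$ be any line intersecting both $[x_1 x_2]$ and $[x_1' x_2']$. Then there is a constant $A > 0$ depending only on $\alpha$ such that

$$\max\{\angle((x_1 x_2), L),\ \angle((x_1' x_2'), L)\} \le A\left(\|x_0 - x_0'\| + \max_{j = 1, 2}\|x_j - x_j'\|\right).$$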


Proof. Let $B$ and $B'$ be shorthands for $B(x_0, \alpha)$ and $B(x_0', \alpha)$, respectively. Since the maximum above is bounded by $\pi/2$, it is enough to prove the inequality when

$$\sigma = \|x_0 - x_0'\| + \max_{j = 1, 2}\|x_j - x_j'\| < \alpha.$$

Let $T = (x_0 x_0')$, and let $H$ and $H'$ denote the two half-planes defined by $T$. Let $t$ denote the intersection point $(\partial B \setminus B') \cap T$, and define $t'$ analogously. Let $m$ denote the intersection point $\partial B \cap \partial B' \cap H$, and define $m'$ analogously, with $H'$ in place of $H$. See Figure 5 for an illustration.

We claim that, when $\sigma < \alpha$, the points $x_1, x_2, x_1', x_2'$ are either all in $H$ or all in $H'$. Indeed, when $x_j$ and $x_j'$ are on opposite sides of $T$, then either $x_j \in \operatorname{arc}(mt)$ and $x_j' \in \operatorname{arc}(m't')$, or $x_j \in \operatorname{arc}(m't)$ and $x_j' \in \operatorname{arc}(mt')$. (For two points $s, t \in \partial B$, $\operatorname{arc}(st)$ denotes the shorter arc defined on $\partial B$ by $s$ and $t$.) The distance between a point in $\operatorname{arc}(mt)$ and a point in $\operatorname{arc}(m't')$ is not smaller than the minimum of $\|t - m\| \ge \sqrt{2}\alpha$ and $\|m - m'\| \ge \sqrt{3}\alpha$, since $0 < \|x_0 - x_0'\| < \alpha$; in either case $\|x_j - x_j'\| > \alpha > \sigma$, a contradiction. Therefore, assume without loss of generality that $x_1, x_2, x_1', x_2' \in H$.

Let $y$ be the point in $H \cap \partial B$ furthest from $T$, so that the tangent of $\partial B$ at $y$ is parallel to $T$. Define $y'$ similarly, with $B'$ in place of $B$. We claim that $x_1, x_2 \in \bar{B}(y, \sqrt{2}\sigma)$ and $x_1', x_2' \in \bar{B}(y', \sqrt{2}\sigma)$. We prove this for $x_1$, without loss of generality, and consider the two possible cases:
Figure 5: Illustrating the proof of Lemma 6.


• If $x_1 \in \operatorname{arc}(ym)$, then

$$\|y - x_1\| \le \|y - m\| \le \|y - y'\| = \|x_0 - x_0'\| \le \sigma.$$

• If $x_1 \in \operatorname{arc}(ty)$, let us define $h = \|x_1 - y\|$, $d = \operatorname{dist}(x_1, (y x_0))$ and $z = P_{(y x_0)}(x_1)$. By the Pythagorean theorem,

$$d^2 + \|y - z\|^2 = h^2, \qquad d^2 + (\alpha - \|y - z\|)^2 = \alpha^2.$$

From this we get $d^2 = h^2(1 - h^2/(4\alpha^2)) \ge h^2/2$, where the inequality is due to $h \le \|t - y\| = \sqrt{2}\alpha$. But $d \le \max_{j \in \{1, 2\}}\|x_j - x_j'\| \le \sigma$. Hence, $h \le \sqrt{2}\sigma$, as claimed.

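By the fact that $B$ is convex, the angle between $(x_1 x_2)$ and $T$ is bounded from above by the maximum angle between $T$ and the tangent of $\partial B$ at any point in $\operatorname{arc}(x_1 x_2)$. Moreover, by direct calculations, similar to those in Lemma 5, for any point $x \in \partial B$ such that $\|y - x\| \le \sqrt{2}\alpha$, the angle between $T$ and the tangent of $\partial B$ at $x$ is bounded by $2\operatorname{asin}(\|y - x\|/(2\alpha)) \le \pi\|y - x\|/(2\alpha)$. Hence, by the fact that $x_1, x_2 \in \bar{B}(y, \sqrt{2}\sigma) \subset \bar{B}(y, \sqrt{2}\alpha)$, we have

$$\angle((x_1 x_2), T) \le \frac{\pi}{2\alpha}\max\{\|y - x_1\|, \|y - x_2\|\} \le \frac{\pi}{2\alpha}\sqrt{2}\sigma = \frac{\pi\sigma}{\sqrt{2}\alpha}.$$

Similarly,

$$\angle((x_1' x_2'), T) \le \frac{\pi\sigma}{\sqrt{2}\alpha}.$$

By an analogous convexity argument, coupled with the fact that all the action is in the half-plane $H$, $\angle(L, T)$ is bounded from above by the maximum of any angle between $T$ and a tangent of $\partial B$ at any point in $\operatorname{arc}(x_1 x_2)$, or any angle between $T$ and a tangent of $\partial B'$ at any point in $\operatorname{arc}(x_1' x_2')$. Hence, as before, we get

$$\angle(L, T) \le \frac{\pi\sigma}{\sqrt{2}\alpha}.$$

All the bounds combined, together with the triangle inequality, yield

$$\angle((x_1 x_2), L) \le \angle((x_1 x_2), T) + \angle(T, L) \le \frac{2\pi\sigma}{\sqrt{2}\alpha} = \frac{\sqrt{2}\pi\sigma}{\alpha},$$

and similarly for $(x_1' x_2')$. This proves the lemma with $A = \sqrt{2}\pi/\alpha$. □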
The following result is useful when comparing the lengths of two curves in terms of their Hausdorff distance and their deviation angle.

Lemma 7 (Th. 43 in (Morvan, 2008)). Let $\Gamma$ be a compact curve in $\mathbb{R}^2$ such that $\rho(\Gamma) \ge r$, and let $C$ be another curve in $\mathbb{R}^2$, differentiable almost everywhere, such that $C \subset B(\Gamma, r)$ and $P_\Gamma$ is one-to-one on $C$. Then

$$\frac{\cos\angle(C, \Gamma)}{1 + \frac{1}{r}\mathcal{H}(C, \Gamma)} \le \frac{\lambda(\Gamma)}{\lambda(C)} \le \frac{1}{1 - \frac{1}{r}\mathcal{H}(C, \Gamma)}.$$

Proof. The result is an immediate consequence of (Morvan, 2008, Th. 43) and the fact that the reach bounds the radius of curvature from below (Federer, 1959, Lem. 4.17). □
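To see the two-sided bound $\cos\angle(C, \Gamma)/(1 + \mathcal{H}(C, \Gamma)/r) \le \lambda(\Gamma)/\lambda(C) \le 1/(1 - \mathcal{H}(C, \Gamma)/r)$ at work in the situation of interest, take $\Gamma$ a circle of radius $R$ (so one may take $r = R$) and $C$ an inscribed regular $N$-gon, which is roughly what an α-shape looks like for a very regular sample. Elementary geometry gives $\mathcal{H}(C, \Gamma) = R(1 - \cos(\pi/N))$, $\angle(C, \Gamma) = \pi/N$ (attained at the vertices), $\lambda(C) = 2NR\sin(\pi/N)$ and $\lambda(\Gamma) = 2\pi R$. The Python sketch below, an illustration of ours rather than part of the argument, evaluates the three quantities; the function name is hypothetical.

```python
import math


def lemma7_sides(N, R=1.0):
    """Lower bound, ratio lambda(Gamma)/lambda(C), and upper bound of Lemma 7 for
    Gamma = circle of radius R (reach R, so r = R) and C = inscribed regular N-gon."""
    x = math.pi / N
    haus = R * (1 - math.cos(x))  # Hausdorff distance H(C, Gamma), at edge midpoints
    dev = x                       # deviation angle, attained at the vertices
    ratio = (2 * math.pi * R) / (2 * N * R * math.sin(x))
    lower = math.cos(dev) / (1 + haus / R)
    upper = 1 / (1 - haus / R)
    return lower, ratio, upper


for N in (3, 6, 24, 96):
    lo, ratio, hi = lemma7_sides(N)
    print(N, round(lo, 6), round(ratio, 6), round(hi, 6))
```

Both bounds tighten as $N$ grows: the gap between the ratio and either side is of order $N^{-2}$, which matches the intuition that the length error is quadratic in the deviation angle.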


3 Some properties of α-edges

Our standing assumption in this section is the following:

(⋆) The data points $\mathcal{X}_n = \{X_1, \dots, X_n\}$ are independently sampled from the uniform distribution on a compact set $S \subset \mathbb{R}^2$ such that both $S$ and $S^c$ satisfy the r-rolling condition.

For any pair of distinct data points within distance $2\alpha$ of each other, there are only two circles of radius $\alpha$ passing through them, symmetric with respect to the line joining the two points. In the special case of an α-edge, at least one of the two circles contains no data points in its interior. The following result implies that, with probability tending to one, the center of such a circle lies outside of $S$.


Proposition 1. Assume (⋆). For any $\alpha > 0$, there is a constant $A > 0$ depending only on $(\alpha, r, \operatorname{diam}(S))$ such that, with probability at least $1 - Ae^{-n/A}$, there are no open balls of radius $\alpha$ with center in $S$ empty of data points.

Proof. Let $d = \operatorname{diam}(S)$ and assume without loss of generality that $S \subset B(0, d)$. We will focus on the case $\alpha \le r$. The case $\alpha > r$ can be analyzed similarly. By Lemma 1, if there is a ball of radius $\alpha$ with center in $S$ empty of data points, then there is a ball of radius $\alpha/2$ included within $S$ that is empty of data points. By Lemma 4, there is an $(\alpha/5)$-net of $S$, denoted $z_1, \dots, z_m$, satisfying $m \le A_1$, where $A_1$ depends only on $d$ and $\alpha$. By the triangle inequality, any ball of radius $\alpha/2$ included within $S$ contains a ball of the form $B(z_k, \alpha/5)$. Hence,

$$\begin{aligned}
\mathbb{P}(\exists z \in S : \mathcal{X}_n \cap B(z, \alpha) = \emptyset)
&\le \mathbb{P}(\exists k = 1, \dots, m : \mathcal{X}_n \cap B(z_k, \alpha/5) = \emptyset) \\
&\le \sum_{k=1}^m \mathbb{P}(\mathcal{X}_n \cap B(z_k, \alpha/5) = \emptyset) \\
&= \sum_{k=1}^m \left(1 - \frac{\mu(B(z_k, \alpha/5))}{\mu(S)}\right)^n \\
&\le A_1\left[1 - (\alpha/(5d))^2\right]^n,
\end{aligned}$$

where in the second inequality we used the union bound and in the third we used the fact that $m \le A_1$ and $S \subset B(0, d)$. Therefore the result holds with $A = \max\{A_1, -1/\log[1 - (\alpha/(5d))^2]\}$. □
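Proposition 1 can be illustrated by simulation. The Python sketch below, our own illustration with arbitrarily chosen parameters, takes $S$ to be the closed unit disk (which, together with its complement, satisfies the rolling condition with $r = 1$), draws a uniform sample, and counts the centers $z \in S$ on a finite grid for which $B(z, \alpha) \cap \mathcal{X}_n = \emptyset$. Inspecting only grid centers approximates the event in the proposition; it is not an exact test. The function names are hypothetical.

```python
import math
import random


def sample_unit_disk(n, rng):
    """n independent uniform points on the unit disk (polar method)."""
    pts = []
    for _ in range(n):
        rad, ang = math.sqrt(rng.random()), 2 * math.pi * rng.random()
        pts.append((rad * math.cos(ang), rad * math.sin(ang)))
    return pts


def empty_grid_centers(points, alpha, step=0.05):
    """Grid centers z in the closed unit disk whose open disk B(z, alpha)
    contains none of the points."""
    k = round(1 / step)
    empty = []
    for i in range(-k, k + 1):
        for j in range(-k, k + 1):
            z = (i * step, j * step)
            if math.hypot(*z) <= 1 and all(math.dist(z, p) >= alpha for p in points):
                empty.append(z)
    return empty


rng = random.Random(1)
for n in (5, 50, 500, 5000):
    print(n, len(empty_grid_centers(sample_unit_disk(n, rng), alpha=0.25)))
```

As $n$ grows, the count of empty grid balls drops to zero, consistent with the exponentially small probability in the proposition.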

Remark 3. We say that a data point is α-isolated if there are no other data points within distance $2\alpha$ of it. Suppose that $X_i$ is α-isolated, so that $B(X_i, 2\alpha) \cap \mathcal{X}_n = \{X_i\}$. By the r-convexity of $S^c$, there is an open ball $B \subset S$ with radius $\alpha$ such that $X_i \in B$, which in particular satisfies $B \subset B(X_i, 2\alpha) \cap S$. Let $B' \subset B$ be an open ball of radius $\alpha/2$ such that $X_i \notin B'$. By construction, $B'$ is included within $S$ and is empty of data points. We conclude by Proposition 1 (applied with $\alpha/2$ in place of $\alpha$) that, under (⋆), with high probability, there are no α-isolated data points.

Proposition 2. Take $\alpha > 0$ and a finite set of points $\mathcal{X} \subset \mathbb{R}^2$ with no α-isolated points. Then the vertices of the α-shape of $\mathcal{X}$ and the vertices of the α-convex hull of $\mathcal{X}$ coincide.

Proof. Let $C$ and $H$ denote the α-shape of $\mathcal{X}$ and the α-convex hull of $\mathcal{X}$, respectively. Note in particular that $H = \bigcap_{B \in \mathcal{B}} B^c$, where $\mathcal{B}$ is the set of open balls of radius $\alpha$ that do not intersect $\mathcal{X}$. First, take $x \in \mathcal{X}$ such that $x \in \partial H$. By (Cuevas et al., 2012, Prop. 2), there is an open ball $B$ of radius $\alpha$ such that $x \in \partial B$ but $B \cap \mathcal{X} = \emptyset$. Let $B$ pivot on $x$. Since $x$ is not α-isolated, the ball will eventually hit another data point, denoted $x'$. Then $x$ and $x'$ belong to the boundary of an open ball $B'$ of radius $\alpha$ that, by construction, does not contain any other data point (otherwise the ball would have hit that other data point before $x'$), so $[xx']$ forms an α-edge. This implies that $x$ is a vertex of $C$. Conversely, let $x$ be a vertex of $C$, so that there is an open ball $B'$ of radius $\alpha$ with $x \in \partial B'$ and $B' \cap \mathcal{X} = \emptyset$. By definition of $H$ above, $B' \subset H^c$. Therefore $x \in \overline{B'} \subset \overline{H^c}$ and, since $x \in H$, we have $x \in H \cap \overline{H^c} = \partial H$. □

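The next proposition bounds the expected number of α-edges.

Proposition 3. Assume (⋆). For any $\alpha \in (0, r)$, there is a constant $A > 0$ depending only on $(\alpha, r, \operatorname{diam}(S))$ such that the expected number of α-edges is bounded by $An^{1/3}$.

Proof. Let $N^{\mathrm{shape}}$ and $N^{\mathrm{hull}}$ denote the number of vertices of the α-shape and of the α-convex hull, respectively, and let $F$ denote the event that there are no α-isolated points. By Proposition 2, $N^{\mathrm{shape}} = N^{\mathrm{hull}}$ on $F$, so that $N^{\mathrm{shape}} \le N^{\mathrm{hull}} + n\mathbb{1}_{F^c}$, and consequently

$$\mathbb{E}(N^{\mathrm{shape}}) \le \mathbb{E}(N^{\mathrm{hull}}) + n\,\mathbb{P}(F^c).$$

On the one hand, $\mathbb{P}(F^c) = 1 - \mathbb{P}(F) \le A_1e^{-n/A_1}$ for some constant $A_1$, by Proposition 1 and Remark 3. On the other hand, by (Pateiro-López and Rodríguez-Casal, 2013, Th. 3), $\mathbb{E}(N^{\mathrm{hull}}) \le A_2n^{1/3}$, for some constant $A_2$. Since the α-edges form a planar graph on the vertices of the α-shape, their number is at most $3N^{\mathrm{shape}}$, and we conclude. □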

Remark 4. For $i < j$, let $G_{ij}$ be the event that $[X_i X_j]$ forms an α-edge. Since the points are iid, $\mathbb{P}(G_{ij})$ does not depend on $i < j$. Hence, the expected number of α-edges is $\binom{n}{2}\mathbb{P}(G_{12})$, and Proposition 3 implies that $\mathbb{P}(G_{ij}) \le An^{-5/3}$ for some constant $A$.
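The exponent $-5/3$ comes from a one-line computation, spelled out here with $A$ the constant of Proposition 3 (for $n \ge 2$, using $n - 1 \ge n/2$):

```latex
\binom{n}{2}\,\mathbb{P}(G_{12}) \le A\,n^{1/3}
\;\Longrightarrow\;
\mathbb{P}(G_{12}) \le \frac{2A\,n^{1/3}}{n(n-1)} \le \frac{4A\,n^{1/3}}{n^2} = 4A\,n^{-5/3}.
```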

The next result ensures that, with high probability, for each connected component of $\partial S$ there is at least one α-edge within distance $\alpha$.

Proposition 4. Assume (⋆). For any $\alpha \in (0, r)$, there is a constant $A > 0$ depending only on $(\alpha, r, \operatorname{diam}(S))$ such that, with probability at least $1 - Ae^{-n/A}$, for any connected component of $\partial S$, there is an α-edge with an endpoint within distance $\alpha$ of that component.

Proof. Suppose that all the open balls of radius $\alpha/2$ centered at a point in $S$ intersect the sample. By Proposition 1 (applied with $\alpha/2$ in place of $\alpha$), this happens with probability at least $1 - Ae^{-n/A}$ for some constant $A > 0$. We saw in Remark 3 that this implies that there are no α-isolated data points. Let $\Gamma_k$ be a connected component of $\Gamma = \partial S$. Fix $y \in \Gamma_k$ and let $\eta$ denote the unit normal vector of $\Gamma_k$ at $y$ pointing away from $S$. For $s \ge 0$, define $y_s = y + s\eta$ and let $s^* = \inf\{s > 0 : B(y_s, \alpha) \cap \mathcal{X}_n = \emptyset\}$. Notice that $B(y_\alpha, \alpha) \subset S^c$ and, therefore, it is empty of data points. Hence, $s^* \le \alpha$. Moreover, we also have $s^* > 0$, since we are assuming that $B(y_0, \alpha/2)$ contains at least one data point (since $y_0 = y \in S$). By construction, there exists a data point $X_i \in \partial B(y_{s^*}, \alpha)$. Now, pivot the ball $B(y_{s^*}, \alpha)$ on $X_i$ as we did in the proof of Proposition 2. Since $X_i$ is not α-isolated, the ball will eventually hit another data point, denoted $X_j$, and $[X_i X_j]$ will form an α-edge. And, since $\|X_i - y_{s^*}\| = \alpha$ and $y_{s^*} \in S^c$ (recall that $0 < s^* \le \alpha$), there is $z \in [X_i y_{s^*}]$ such that $z \in \Gamma$. We now use the fact that $B(y_{s^*}, \alpha) \cap \Gamma$ is contractible (Federer, 1959, Rem. 4.15); since $B(y_{s^*}, \alpha) \cap \Gamma_k \ne \emptyset$, we must have $B(y_{s^*}, \alpha) \cap \Gamma = B(y_{s^*}, \alpha) \cap \Gamma_k$, which in turn implies that $z \in \Gamma_k$ and, therefore, $\operatorname{dist}(X_i, \Gamma_k) \le \alpha$. □

Next, we prove some quantitative results about α-edges. In plain English, we show that, with probability tending to one, α-edges are near the boundary of $S$, have small length, and their deviation angle with the boundary of $S$ is small.

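Proposition 5. Assume (⋆). For $i < j$, let $G_{ij}$ denote the event that $[X_i X_j]$ is an α-edge, and for $t > 0$, let $H_{ij,t}$ denote the event that

$$[X_i X_j] \subset B(\partial S, t), \quad \|X_i - X_j\| < \sqrt{t} \quad \text{and} \quad \angle([X_i X_j], \partial S) < \sqrt{t}. \tag{11}$$

For any $\alpha \in (0, r)$, there is a constant $A > 0$ depending only on $(\alpha, r, \operatorname{diam}(S))$ such that, for any $0 < t < \min\{\alpha, 2\alpha^2/r\}$, $\mathbb{P}(G_{ij} \cap H_{ij,t}^c) \le A\exp(-nt^{3/2}/A)$.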

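Proof. Let $\Gamma$ be a shorthand for $\partial S$. For any two distinct points $x, x' \in \mathbb{R}^2$ such that $\|x - x'\| \le 2\alpha$, define

$$\zeta^{\pm}(x, x') = x + \alpha R_{\pm\theta} \frac{x' - x}{\|x' - x\|},$$

where $\theta = \arccos(\|x - x'\|/(2\alpha))$ and $R_\theta$ denotes the rotation by angle $\theta$. By construction, $x, x' \in \partial B(\zeta^{\pm}(x, x'), \alpha)$, and $\zeta^{+}(x, x')$ and $\zeta^{-}(x, x')$ are the only two points with this property. Let $C_{ij}^{\pm}$ be short for $\zeta^{\pm}(X_i, X_j)$ if $\|X_i - X_j\| \le 2\alpha$; otherwise no ball of radius $\alpha$ has both $X_i$ and $X_j$ on its boundary, so $G_{ij}$ cannot occur.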
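The two centers $\zeta^{\pm}(x, x')$ can be computed directly from the formula above. The following Python sketch is an illustration only (the name `circle_centers` is ours, not taken from any library or from the authors' code); it returns `None` when the points coincide or are more than $2\alpha$ apart, the case in which no such ball exists.

```python
import math


def circle_centers(x, xp, alpha):
    """Return (zeta_plus, zeta_minus) for the points x and xp, or None.

    Implements zeta^{+/-}(x, x') = x + alpha * R_{+/-theta} (x' - x) / ||x' - x||
    with theta = acos(||x - x'|| / (2 * alpha)).  Returns None when the points
    coincide or are more than 2 * alpha apart, since then no circle of radius
    alpha passes through both.
    """
    dx, dy = xp[0] - x[0], xp[1] - x[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > 2 * alpha:
        return None
    ux, uy = dx / dist, dy / dist            # unit vector from x towards x'
    theta = math.acos(dist / (2 * alpha))    # rotation angle
    centers = []
    for sign in (1, -1):
        c, s = math.cos(sign * theta), math.sin(sign * theta)
        # rotate (ux, uy) by sign * theta, then step a distance alpha from x
        centers.append((x[0] + alpha * (c * ux - s * uy),
                        x[1] + alpha * (s * ux + c * uy)))
    return tuple(centers)
```

For instance, with $x = (0,0)$, $x' = (1,0)$ and $\alpha = 1$, the two centers are $(1/2, \pm\sqrt{3}/2)$, both at distance 1 from $x$ and $x'$.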

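Let $E$ be the event that there are no open balls of radius $\alpha$ with center in $S$ empty of data points. We studied this event in Proposition 1. With $A_1$ denoting the constant of Lemma 5, we have

$$H_{ij,t}^c \cap G_{ij} \cap E \subset \left\{\exists\, \epsilon \in \{-,+\} : \mathcal{X}_n \cap B(C_{ij}^{\epsilon}, \alpha) = \emptyset,\ C_{ij}^{\epsilon} \notin S \text{ and } \mathrm{dist}(C_{ij}^{\epsilon}, S) < \alpha - t/A_1\right\}.$$

Therefore, the union bound gives

$$\mathbb{P}(H_{ij,t}^c \cap G_{ij} \cap E) \le \sum_{\epsilon \in \{-,+\}} \mathbb{P}\left(\mathcal{X}_n \cap B(C_{ij}^{\epsilon}, \alpha) = \emptyset,\ C_{ij}^{\epsilon} \notin S \text{ and } \mathrm{dist}(C_{ij}^{\epsilon}, S) < \alpha - t/A_1\right).$$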

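With $A_2$ denoting the constant of Lemma 2, for any deterministic point $C \notin S$ such that $\mathrm{dist}(C, S) < \alpha - t/A_1$, we have

$$\mathbb{P}(\mathcal{X}_{n-2} \cap B(C, \alpha) = \emptyset) = \left(1 - \frac{\mu(S \cap B(C, \alpha))}{\mu(S)}\right)^{n-2} \le \left(1 - \frac{A_2 t^{3/2}}{A_1^{3/2} d^2}\right)^{n-2} \le A_3 \exp\left(-\frac{n t^{3/2}}{A_3}\right),$$

for some constant $A_3$ which depends only on $\alpha$, $r$ and $d := \mathrm{diam}(S)$. Hence, conditioning on $(X_i, X_j)$, we have

$$\mathbb{P}\left(\mathcal{X}_n \cap B(C_{ij}^{\epsilon}, \alpha) = \emptyset,\ C_{ij}^{\epsilon} \notin S \text{ and } \mathrm{dist}(C_{ij}^{\epsilon}, S) < \alpha - t/A_1\right) \le A_3 \exp\left(-\frac{n t^{3/2}}{A_3}\right).$$

Together with Proposition 1, we arrive at

$$\mathbb{P}(H_{ij,t}^c \cap G_{ij}) \le \mathbb{P}(H_{ij,t}^c \cap G_{ij} \cap E) + \mathbb{P}(E^c) \le A_4 \exp\left(-\frac{n t^{3/2}}{A_4}\right),$$

for some constant $A_4$, again depending only on $(\alpha, r, d)$. □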
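The passage from a power bound to an exponential bound in the proof above uses only the inequality $1 - x \le e^{-x}$. As a sketch, assuming the area bound takes the form $\mu(S \cap B(C, \alpha))/\mu(S) \ge c\, t^{3/2}$ with $c := A_2/(A_1^{3/2} d^2)$, and using $t < \alpha$, one admissible choice of $A_3$ is obtained as follows:

$$\left(1 - c\, t^{3/2}\right)^{n-2} \le \exp\left(-(n-2)\, c\, t^{3/2}\right) = e^{2c\, t^{3/2}} \exp\left(-n\, c\, t^{3/2}\right) \le A_3 \exp\left(-\frac{n\, t^{3/2}}{A_3}\right), \qquad A_3 := \max\{e^{2c\, \alpha^{3/2}},\ 1/c\}.$$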

The next two results combined imply that, with high probability, the $\alpha$-edges form simple closed polygons, one for each connected component of $\partial S$, in one-to-one correspondence with $\partial S$. The first result shows that, with high probability, two distinct points in the union of all $\alpha$-edges do not project onto the same point of $\partial S$. We also show that $\alpha$-edges are all one-sided, in the sense that at least one of the two open balls of radius $\alpha$ that circumscribe an $\alpha$-edge contains a data point.
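For small point sets, the notions of $\alpha$-edge and one-sidedness can be checked by brute force. The sketch below uses a hypothetical helper (`alpha_edges` is our name; it is not the alphahull implementation used in Section 5, and its cost is $O(n^3)$). It tests both candidate balls of each pair against all other data points, treats points exactly on a circle as outside the open ball, and records how many of the two balls are empty: 1 for a one-sided $\alpha$-edge, 2 for a two-sided one.

```python
import math
from itertools import combinations


def circle_centers(x, xp, alpha):
    """Centers of the two circles of radius alpha through x and xp, or None."""
    dx, dy = xp[0] - x[0], xp[1] - x[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > 2 * alpha:
        return None
    ux, uy = dx / dist, dy / dist
    theta = math.acos(dist / (2 * alpha))
    centers = []
    for sign in (1, -1):
        c, s = math.cos(sign * theta), math.sin(sign * theta)
        centers.append((x[0] + alpha * (c * ux - s * uy),
                        x[1] + alpha * (s * ux + c * uy)))
    return centers


def alpha_edges(points, alpha, tol=1e-12):
    """Map each alpha-edge (i, j), i < j, to its number of empty circumscribing balls.

    A value of 1 means the edge is one-sided, 2 means it is two-sided.
    Points at distance alpha from a center (up to tol) count as outside the
    open ball.
    """
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        centers = circle_centers(points[i], points[j], alpha)
        if centers is None:
            continue  # ||X_i - X_j|| > 2 alpha: cannot be an alpha-edge
        if centers[0] == centers[1]:
            centers = centers[:1]  # ||X_i - X_j|| = 2 alpha: a single ball
        empty = sum(
            1 for c in centers
            if all(math.dist(c, p) >= alpha - tol
                   for k, p in enumerate(points) if k not in (i, j))
        )
        if empty:
            edges[(i, j)] = empty
    return edges
```

For the four corners of the unit square and $\alpha = 1$, each side is a one-sided $\alpha$-edge and neither diagonal is an $\alpha$-edge; with only two data points, the single $\alpha$-edge is two-sided.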

Proposition 6. Assume $(\star)$. For any $\alpha \in (0,r)$, there is a constant $A > 0$ depending only on $(\alpha, r, \mathrm{diam}(S))$ such that, with probability at least $1 - A e^{-n/A}$: (i) all $\alpha$-edges are one-sided; and (ii) the metric projection onto $\partial S$ is injective on the union of all $\alpha$-edges.

Proof. Let $\Gamma$ be a shorthand for $\partial S$ and $d = \mathrm{diam}(S)$. Assume there are no balls of radius $\alpha$ with center in $S$ empty of data points and that, for $t$ fixed (and chosen small enough in what follows), all the $\alpha$-edges satisfy (11). Both events happen together with probability at least $1 - A e^{-n/A}$, for some constant $A > 0$, by Propositions 1 and 5.

We first show that, if $t$ is small enough, all $\alpha$-edges are one-sided. Let $[x_1x_2]$ (with $x_1 = X_i$, $x_2 = X_j$) be an arbitrary $\alpha$-edge. Let $x_m = (x_1 + x_2)/2$ be the midpoint of that $\alpha$-edge and $\rho = (\alpha^2 - \|x_1 - x_m\|^2)^{1/2}$. If there is a ball $B$ of radius $\alpha$ such that $x_1, x_2 \in \partial B$, then the center of $B$ is either $z_e = x_m + \rho u$ or $z_g = x_m - \rho u$, where $u$ is the unit vector orthogonal to $(x_1x_2)$ such that $\langle u, \eta \rangle > 0$, $\eta$ being the outward pointing unit normal vector at $y_m = P_\Gamma(x_m)$, which is well defined when $t < r$. Notice that the vector $u$ is well defined when $\sqrt{t} < \pi/2$, since in that case $(x_1x_2)$ is not orthogonal to $\Gamma$. We will prove that, for $t$ even smaller, $z_g \in S$ and therefore $B(z_g, \alpha)$ is not empty of sample points. Define $c_g = y_m - \rho\eta$ and $c = y_m - r\eta$. By the $r$-rolling property, $B(c, r) \subset S$. By the triangle inequality and (11), we have

$$\|z_g - c_g\| \le \|x_m - y_m\| + \rho\|u - \eta\| \le t + \alpha\|u - \eta\|,$$

where

$$\|u - \eta\|^2 = 2(1 - \langle u, \eta \rangle) \le 2\left(1 - \cos\angle([x_1x_2], \Gamma)\right) \le 2t,$$

using (11) (i.e., $\angle([x_1x_2], \Gamma) \le \sqrt{t}$) and the fact that $\cos(a) \ge 1 - a^2/2$ for any $a \in \mathbb{R}$. Using the triangle inequality and (11) again, we get

$$\|z_g - c\| \le \|z_g - c_g\| + \|c_g - c\| \le t + \alpha\sqrt{2t} + r - \sqrt{\alpha^2 - t} < r,$$

for $t$ small enough, in which case $z_g \in B(c, r) \subset S$.

Now we prove that the metric projection onto $\Gamma$ is injective on the union of all $\alpha$-edges. Indeed, assume that this is not the case, so that there are two distinct points belonging to some (necessarily distinct) $\alpha$-edges, $x \in [X_iX_j]$ and $x' \in [X_{i'}X_{j'}]$, with the same metric projection onto $\Gamma$, denoted $y = P_\Gamma(x) = P_\Gamma(x')$. Let $\eta$ be the outward pointing unit normal vector at $y$. For short, let $x_1 = X_i$, $x_2 = X_j$, $x_1' = X_{i'}$, $x_2' = X_{j'}$. By the triangle inequality and the fact that $\|x - x'\| \le \mathrm{dist}(x, \Gamma) + \mathrm{dist}(x', \Gamma)$, and then (11), we have

$$\max_{k,l \in \{1,2\}} \|x_k - x_l'\| \le \|x - x'\| + \|x_1 - x_2\| + \|x_1' - x_2'\| \le 2t + 2\sqrt{t} \le 3\sqrt{t}, \tag{12}$$

when $t$ is small enough. Also, by the triangle inequality for angles and (11),

$$\angle((x_1x_2), (x_1'x_2')) \le \angle((x_1x_2), T_y) + \angle(T_y, (x_1'x_2')) \le 2\sqrt{t}, \tag{13}$$

where $T_y$ denotes the tangent line to $\Gamma$ at $y$.

Let $B$ and $B'$ denote the open balls of radius $\alpha$ circumscribing $[x_1x_2]$ and $[x_1'x_2']$, respectively, and empty of data points. Since all $\alpha$-edges are one-sided, these balls are uniquely defined. Also, define $\rho'$, $u'$, $z_e'$ and $z_g'$ analogously to $\rho$, $u$, $z_e$ and $z_g$ above, but based on $x_1'$ and $x_2'$ instead of $x_1$ and $x_2$. Using the same notation as above, we have $B = B(z_e, \alpha)$ and $B' = B(z_e', \alpha)$, and

$$\|z_e - z_e'\| \le \|x_m - x_m'\| + \|\rho u - \rho' u'\|. \tag{14}$$

Reasoning as in (12) above, we have $\|x_m - x_m'\| \le 3\sqrt{t}$. Also,

$$\|\rho u - \rho' u'\|^2 = \rho^2 + (\rho')^2 - 2\rho\rho'\langle u, u' \rangle.$$

Using (11), $\rho^2 = \alpha^2 - \|x_1 - x_m\|^2 \ge \alpha^2 - t$ and, similarly, $(\rho')^2 \ge \alpha^2 - t$. Moreover, by (13) and using again the inequality $\cos(a) \ge 1 - a^2/2$ for any $a \in \mathbb{R}$, we get $\langle u, u' \rangle \ge 1 - 4t$. Hence,

$$\|\rho u - \rho' u'\|^2 \le 2\alpha^2 - 2(\alpha^2 - t)(1 - 4t) \le (8\alpha^2 + 2)t.$$

Consequently, the bound in (14) leads to $\|z_e - z_e'\| \le 3\sqrt{t} + (8\alpha^2 + 2)^{1/2}\sqrt{t} = A_1\sqrt{t}$ when $t$ is small enough, where $A_1$ is a constant. Combining this bound with that in (13), and applying Lemma 6, we obtain that

$$\max\{\angle((xx'), (x_1x_2)),\ \angle((xx'), (x_1'x_2'))\} \le A_2\sqrt{t},$$

where $A_2$ is a constant. By the fact that $(xx')$ is parallel to $\eta$ (Federer, 1959, Th. 4.18(12)) and using (11), we also have

$$\max\{\angle((xx'), (x_1x_2)),\ \angle((xx'), (x_1'x_2'))\} \ge \frac{\pi}{2} - \sqrt{t}.$$

We therefore have a contradiction when $t$ is small enough that all the derivations above apply and, in addition, $\sqrt{t} < \pi/(2A_2 + 2)$. □
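The algebra behind the bound on $\|\rho u - \rho' u'\|^2$ in the proof above can be spelled out as follows. Using $\rho^2, (\rho')^2 \le \alpha^2$, $\rho\rho' \ge \alpha^2 - t \ge 0$ and $\langle u, u' \rangle \ge 1 - 4t \ge 0$ (the last requires $t \le 1/4$, one of the "small enough" conditions),

$$\|\rho u - \rho' u'\|^2 \le 2\alpha^2 - 2(\alpha^2 - t)(1 - 4t) = 8\alpha^2 t + 2t - 8t^2 \le (8\alpha^2 + 2)t,$$

so that, together with $\|x_m - x_m'\| \le 3\sqrt{t}$, one admissible value of the constant is $A_1 = 3 + (8\alpha^2 + 2)^{1/2}$.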

Remark 5. Any one-sided $\alpha$-edge shares each one of its endpoints with another $\alpha$-edge. Indeed, suppose $[x_1x_2]$ is an $\alpha$-edge, so that there exists $C$ such that $x_1, x_2 \in \partial B(C, \alpha)$ and $\mathcal{X}_n \cap B(C, \alpha) = \emptyset$. In that case, let $B(C, \alpha)$ pivot on $x_2$ away from $x_1$, as we did in the proof of Proposition 4. Let $x_3$ denote the first data point that the ball hits. Then $[x_2x_3]$ is an $\alpha$-edge by construction. If $x_2$ is not shared with any other $\alpha$-edge, then the ball pivots on $x_2$ away from $x_1$ until it touches $x_1$ from the other side. That open ball is empty of data points and, together with the ball we started with, it makes $[x_1x_2]$ two-sided, a contradiction.

Proposition 7. Assume $(\star)$. For any $\alpha \in (0,r)$, there is a constant $A > 0$ depending only on $(\alpha, r, \mathrm{diam}(S))$ such that, with probability at least $1 - A e^{-n/A}$, the union of all $\alpha$-edges is in one-to-one correspondence with $\partial S$ via the metric projection onto $\partial S$.

Proof. Let $\Gamma$ be a shorthand for $\partial S$ and $d = \mathrm{diam}(S)$, and let $C_\alpha$ denote the union of all $\alpha$-edges. Since $\Gamma$ is a compact one-dimensional manifold (Walther, 1999), it is well known that each connected component of $\Gamma$ is a closed curve homeomorphic to the unit circle; see (Lee, 2011, Thm. 5.27). We prove that this is also the case for each connected component of $C_\alpha$. We assume that the metric projection onto $\Gamma$, denoted $P_\Gamma$, is injective on $C_\alpha$, that all $\alpha$-edges are one-sided, that $C_\alpha \subset B(\Gamma, \alpha)$ (so that $P_\Gamma$ is well defined on $C_\alpha$), and that $C_\alpha \cap B(\Gamma_k, \alpha) \neq \emptyset$ for any connected component $\Gamma_k$ of $\Gamma$. This event happens with probability at least $1 - A e^{-n/A}$ for some constant $A > 0$, by Propositions 5, 4 and 6. We prove that, under these circumstances, $C_\alpha$ is in one-to-one correspondence with $\Gamma$ via $P_\Gamma$. Indeed, let $\Gamma_k$ be a connected component of $\Gamma$. Let $[x_1x_2]$ be an $\alpha$-edge such that $[x_1x_2] \cap B(\Gamma_k, \alpha) \neq \emptyset$. Since all $\alpha$-edges are one-sided, Remark 5 provides a data point $x_3$ such that $[x_2x_3]$ is also an $\alpha$-edge. Having constructed $[x_{a-1}x_a]$, let $x_{a+1}$ be a data point such that $[x_ax_{a+1}]$ is an $\alpha$-edge. Since $C_\alpha \subset B(\Gamma, \alpha) = \bigcup_k B(\Gamma_k, \alpha)$, where the union is of disjoint sets by (Federer, 1959, Rem. 4.15(1)), and the polygon $\bigcup_a [x_ax_{a+1}]$ is connected, necessarily $\bigcup_a [x_ax_{a+1}] \subset B(\Gamma_k, \alpha)$. Also, since the sequence $(x_a : a \ge 1)$ is made of finitely many data points, and $x_a \neq x_{a+1}$ for all $a$, there are $a, b \ge 1$ such that $x_a = x_{a+b+1}$, and we may further assume that $x_a, \dots, x_{a+b}$ are all distinct. Therefore, by construction, $C = [x_ax_{a+1}] \cup \dots \cup [x_{a+b}x_{a+b+1}]$ is a simple closed polygon made of $\alpha$-edges such that $C \subset B(\Gamma_k, \alpha)$. In particular, the latter implies that $P_\Gamma(C) \subset \Gamma_k$, and since $C$ is homeomorphic to the unit circle and $P_\Gamma$ is continuous and injective on $C$, $P_\Gamma(C)$ is also homeomorphic to the unit circle. This forces $P_\Gamma(C) = \Gamma_k$, since $\Gamma_k$ is homeomorphic to the unit circle too. Since all this is true for any $k$, meaning any connected component of $\Gamma$, we conclude that $P_\Gamma : C_\alpha \to \Gamma$ is not only injective, but also surjective. □
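The walk $x_1, x_2, x_3, \dots$ in the proof of Proposition 7 is a cycle-tracing procedure. As an illustration only, the Python sketch below (the helper `trace_cycles` is hypothetical) assumes, as on the event considered in the proof, that each data point on an $\alpha$-edge is an endpoint of exactly two $\alpha$-edges, and splits the edge set into closed polygons, returned as lists of vertex labels.

```python
from collections import defaultdict


def trace_cycles(edges):
    """Split an edge set in which every vertex has degree 2 into closed cycles.

    Mirrors the walk in the proof: from [x_{a-1} x_a], continue along the
    other edge at x_a until the starting vertex is reached again.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if any(len(nbrs) != 2 for nbrs in adj.values()):
        raise ValueError("every vertex must lie on exactly two edges")
    seen, cycles = set(), []
    for start in adj:
        if start in seen:
            continue
        cycle, prev, cur = [start], None, start
        seen.add(start)
        while True:
            a, b = adj[cur]
            nxt = b if a == prev else a  # leave cur by the edge we did not arrive on
            if nxt == start:
                break
            cycle.append(nxt)
            seen.add(nxt)
            prev, cur = cur, nxt
        cycles.append(cycle)
    return cycles
```

Each returned list corresponds to one simple polygon $C$ in the proof; on the event considered there, there is one such polygon per connected component of $\partial S$.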


4 Proof of Theorem 1 


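We are now in a position to prove the main result, namely Theorem 1. Let $\Gamma$ be a shorthand for $\partial S$ and let $C_\alpha$ denote the union of all $\alpha$-edges.

By Proposition 5 together with the union bound, and then Proposition 7, for any $0 < t < \min\{\alpha, 2\alpha^2/r\}$, with probability at least $1 - A_1 n^2 \exp(-n t^{3/2}/A_1)$ for some constant $A_1 > 0$ depending only on $(\alpha, r, \mathrm{diam}(S))$, $C_\alpha$ is in one-to-one correspondence with $\Gamma$ via the metric projection onto $\Gamma$, and satisfies $C_\alpha \subset B(\Gamma, t)$ and $\angle(C_\alpha, \Gamma) \le \sqrt{t}$. Note that, because $C_\alpha$ and $\Gamma$ are in one-to-one correspondence, $C_\alpha \subset B(\Gamma, t)$ implies that $\Gamma \subset B(C_\alpha, t)$, so that $H(C_\alpha, \Gamma) \le t$, where $H$ denotes the Hausdorff distance. We now apply Lemma 7, combined with the simple bounds $\cos a \ge 1 - a^2/2$, for $a \ge 0$, and $(1 - a)^{-1} \le 1 + 2a$, valid when $0 \le a \le 1/2$. Assuming $t \le 1$, this yields

$$\frac{\lambda(C_\alpha)}{\lambda(\Gamma)} \le 1 + (1 + 2/r)t$$

and

$$\frac{\lambda(C_\alpha)}{\lambda(\Gamma)} \ge 1 - \frac{H(C_\alpha, \Gamma)}{r} \ge 1 - \frac{t}{r}.$$

We get

$$\left|\frac{\lambda(C_\alpha)}{\lambda(\Gamma)} - 1\right| \le (1 + 2/r)t.$$

Hence, if $t < t_0 := \min\{\alpha/2, 2\alpha^2/r, 1\}$, we have

$$\mathbb{P}\left(\left|\frac{\lambda(C_\alpha)}{\lambda(\Gamma)} - 1\right| > (1 + 2/r)t\right) \le A_1 n^2 \exp(-n t^{3/2}/A_1).$$

Then a change of variable concludes the proof of Theorem 1.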
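One way to carry out the final change of variable, sketched here only to show how a rate of order $(\log(n)/n)^{2/3}$ arises from the bound $\mathbb{P}(|\lambda(C_\alpha)/\lambda(\Gamma) - 1| > (1 + 2/r)t) \le A_1 n^2 \exp(-n t^{3/2}/A_1)$ (the particular choice of $t_n$ is ours): take $t_n := (3A_1 \log(n)/n)^{2/3}$, which is below $t_0$ for $n$ large enough. Then $n t_n^{3/2}/A_1 = 3\log n$, so

$$A_1 n^2 \exp(-n t_n^{3/2}/A_1) = A_1 n^2 n^{-3} = \frac{A_1}{n}, \qquad \text{hence} \qquad \mathbb{P}\left(\left|\frac{\lambda(C_\alpha)}{\lambda(\Gamma)} - 1\right| > \left(1 + \frac{2}{r}\right)\left(\frac{3A_1 \log n}{n}\right)^{2/3}\right) \le \frac{A_1}{n}.$$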


5 Numerical experiments 

In order to check the conclusions of Theorem 1 numerically, we performed a small simulation study. For the set $S$ we chose the annulus $\{x \in \mathbb{R}^2 : 0.25 \le \|x\| \le 1\}$. In this case, $r$ is equal to 0.25 (the radius of the hole) and $\lambda(\partial S) = 2\pi(0.25 + 1)$. The selected sample sizes were $n = 1000, 5000, 10000, 30000, 40000, 50000$. For each sample size $n$, we simulated $M = 1000$ samples from the uniform distribution on $S$ and calculated the $\alpha$-shape for each sample. The values of $\alpha$ were $0.05, 0.1, 0.15, 0.2, 0.24$, and the limit case $\alpha = r = 0.25$. Given $n$, $\alpha$, and a sample $m \in \{1, \dots, M\}$, we computed the sample $\alpha$-shape, denoted $C_\alpha^{n,m}$, using the R package alphahull of Pateiro-López and Rodríguez-Casal (2010), and then its perimeter $\lambda(C_\alpha^{n,m})$. We estimated the expected error and the bias by

$$e_\alpha(n) = \frac{1}{M}\sum_{m=1}^{M} \left|\lambda(C_\alpha^{n,m}) - \lambda(\partial S)\right| \quad \text{and} \quad b_\alpha(n) = \frac{1}{M}\sum_{m=1}^{M} \lambda(C_\alpha^{n,m}) - \lambda(\partial S),$$

respectively. Let $s_\alpha(n)$ denote the sample standard deviation of $\{\lambda(C_\alpha^{n,m}), m = 1, \dots, M\}$.
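A minimal Python sketch of the simulation bookkeeping, under stated assumptions: it samples uniformly from the annulus by drawing the angle uniformly and the radius with density proportional to $\rho$ on $[0.25, 1]$ (inverse-CDF sampling), and computes $e_\alpha(n)$, $b_\alpha(n)$ and $s_\alpha(n)$ from a list of $M$ perimeters. It does not compute the $\alpha$-shape itself, which the study obtains from alphahull; the names `sample_annulus` and `summarize` are ours.

```python
import math
import random
import statistics

R_IN, R_OUT = 0.25, 1.0                        # annulus radii used in the study
TRUE_PERIMETER = 2 * math.pi * (R_IN + R_OUT)  # lambda(dS) = 2*pi*(0.25 + 1)


def sample_annulus(n, rng):
    """Draw n points uniformly from {R_IN <= ||x|| <= R_OUT}.

    The radius has density proportional to rho on [R_IN, R_OUT], so
    rho**2 is uniform on [R_IN**2, R_OUT**2] (inverse-CDF sampling).
    """
    points = []
    for _ in range(n):
        rho = math.sqrt(rng.uniform(R_IN ** 2, R_OUT ** 2))
        phi = rng.uniform(0.0, 2 * math.pi)
        points.append((rho * math.cos(phi), rho * math.sin(phi)))
    return points


def summarize(perimeters, truth=TRUE_PERIMETER):
    """Return (e, b, s): mean absolute error, bias and sample standard deviation."""
    m = len(perimeters)
    e = sum(abs(p - truth) for p in perimeters) / m
    b = sum(perimeters) / m - truth
    s = statistics.stdev(perimeters)  # uses the M - 1 denominator
    return e, b, s
```

A typical use draws `sample_annulus(n, random.Random(seed))` for each replicate, passes the points to an $\alpha$-shape routine, and feeds the resulting list of perimeters to `summarize`.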

• Among the values of $\alpha$ that we tried, the estimator performs best at $\alpha = 0.2$. It does not seem that, asymptotically, the best $\alpha$ converges to $r$. For instance, the ratio $e_{0.24}(n)/e_{0.2}(n)$ is around 6.7 for $n \ge 30000$.

• Figure 6 shows the error versus sample size in log-log scale for $\alpha = 0.1, 0.2, 0.24, 0.25$. It can be seen that the error corresponding to $\alpha = r$ does not go to zero, whereas $\alpha = 0.2$ always outperforms the other considered values of $\alpha$. The trend for large values of $n$ is clearly linear and the slope is close to $-2/3$, as Theorem 1 predicts. This is particularly true when $\alpha = 0.2$ (our best choice), where fitting a line by least squares yields a slope of $-0.67$, with (Student) 95% confidence interval $(-0.73, -0.62)$, and an R-squared exceeding 0.99.

• For the limit case $\alpha = r$, the bias $b_r(n)$ does not go to zero as the sample size increases. The error $e_r(n)$ is approximately equal to 0.18; see Figure 6. This shows, from the numerical point of view, that the perimeter of the $\alpha$-shape is not a consistent estimator of $\lambda(\partial S)$ for $\alpha = r$. The main problem here is that the length of the $\alpha$-edges does not go to zero, whereas Proposition 5 shows that it does for $\alpha < r$.

• The standard deviation seems to decay faster than $n^{-2/3}$. In fact, we have reasons to believe that the slope is of order […]. This is confirmed numerically. Indeed, if we fit a line to the log-log plot of $s_{0.2}(n)$, we get a slope with (Student) 95% confidence interval $(-0.86, -0.82)$. So, asymptotically, it seems that the error is dominated by the bias. This suggests that reducing the bias of the estimator could improve the convergence rate of the method.

• The random variable $\lambda(C_\alpha)$ seems to be asymptotically normal. For the largest considered $n = 50{,}000$, the sample $\{\lambda(C_\alpha^{n,m}), m = 1, \dots, M\}$ passes the Shapiro-Wilk normality test for several values of $\alpha$. For instance, for $\alpha = 0.2$, we obtained a p-value of 0.82.
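The slopes reported above come from least-squares lines fitted on the log-log scale. A minimal sketch of such a fit in Python (our own helper, `loglog_slope`; it returns the slope and $R^2$ but not the Student confidence interval), checked on synthetic errors proportional to $n^{-2/3}$ rather than on the study's data:

```python
import math


def loglog_slope(ns, errors):
    """Least-squares slope and R^2 of log(errors) regressed on log(ns)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, 1.0 - ss_res / ss_tot
```

With the sample sizes of the study and errors equal to $2n^{-2/3}$, the fitted slope is $-2/3$ up to rounding and $R^2$ is 1.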


6 Discussion 

We discuss a number of extensions and open problems. 

Extensions. Our arguments extend more or less directly to other sampling distributions. It is straightforward to see that Theorem 1 applies verbatim to a sampling distribution that has a density with respect to the uniform distribution on $S$ that is bounded away from zero near the boundary of $S$. A little less obvious is an extension to the case where this density converges to zero at some given rate near the boundary, which ends up affecting the rate of convergence of our estimator. In any case, our estimator remains consistent. The same results carry over to the case where $\partial S$ has a finite number of 'kinks', i.e., corner points where $\partial S$ is not smooth.

Figure 6: Plot of error versus sample size, in log-log scale. The error corresponding to $\alpha = r = 0.25$ does not converge to zero. For values of $\alpha < r$, the plots show asymptotic slopes which are all very close to $-2/3$, as Theorem 1 predicts.


Choice of tuning parameter. The estimator depends on knowledge of $r$, or at least of a lower bound on $r$, since any fixed $\alpha \in (0, r)$ appears to yield the convergence rate of Theorem 1. Choosing $\alpha$ automatically, therefore, requires an estimate of the size of $r$. This is done in recent work by Rodríguez-Casal and Saavedra-Nieves (2014). Suppose we have an estimator $\hat r_n$ such that $r/2 \le \hat r_n \le 3r/2$ with high probability. We speculate that the convergence bound obtained in Theorem 1 with $\alpha$ chosen equal to $\hat r_n/4$ remains valid, albeit with a different multiplicative constant.

Finer asymptotics. Bräker and Hsing (1998) were able to compute the exact asymptotic expected value and variance of the perimeter of the convex hull of a sample, and also to establish an asymptotically normal limit. An open problem would be to do the same here. Our numerical experiments lead us to speculate that our estimator is also asymptotically normal.

Minimax rate. We conjecture that the rate that our estimator achieves, i.e., $n^{-2/3}\,\mathrm{polylog}(n)$, is not minimax optimal, not even in the exponent. Indeed, we learn in (Korostelev and Tsybakov, 1993, Chap. 8) that, for the problem of estimating the area (in the context of binary images), an estimator obtained from computing the area of an optimal set estimator (for the symmetric difference metric, and the $\alpha$-convex hull is such an estimator) only achieves the rate […], while the optimal rate is […] under the assumptions we make here. It is reasonable to expect that the same is true for the more delicate problem of perimeter estimation. In fact, Kim and Korostelev (2000) show that […] is (up to a poly-logarithmic factor) the minimax rate for perimeter estimation of a horizon (also in the context of binary images).

Higher dimensions. Our setting is that of a set $S$ in two dimensions. What about higher dimensions? The problem would be to estimate the $(d-1)$-volume of the boundary of a set $S \subset \mathbb{R}^d$ under the same conditions, and the estimator would be the $(d-1)$-volume of the $\alpha$-shape of $\mathcal{X}_n$, which is the union of all the $\alpha$-faces. We say that $X_{i_1}, \dots, X_{i_d}$ form an $\alpha$-face if they are affinely independent and there is an open ball $B$ of radius $\alpha$ such that $X_{i_1}, \dots, X_{i_d} \in \partial B$ and $B \cap \mathcal{X}_n = \emptyset$. Most of the auxiliary lemmas and propositions can be extended to this general framework. However, we do not know how to extend Proposition 7.

The $\alpha$-convex hull. Our results apply to the $\alpha$-convex hull of the sample. This is because, with high probability, it shares the same vertices as the $\alpha$-shape (by Proposition 2). When this is the case, the boundary of the former is the union of arcs of radius $\alpha$ with base the $\alpha$-edges. In particular, if an $\alpha$-edge is of length $\ell$, then the length of that arc is $2\alpha\sin^{-1}(\ell/(2\alpha)) = \ell + O(\ell^3)$. By Proposition 5 and an application of the union bound, the largest $\alpha$-edge is of order $O_P((\log(n)/n)^{1/3})$. We conclude that the ratio between the perimeters of the $\alpha$-convex hull and of the $\alpha$-shape is $1 + O_P((\log(n)/n)^{2/3})$. We note, however, that the perimeter of the $r$-convex hull is consistent while the perimeter of the $r$-shape is not necessarily so. Our results require $\alpha < r$.
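The expansion $2\alpha\sin^{-1}(\ell/(2\alpha)) = \ell + O(\ell^3)$ can be checked numerically; the Taylor series $\sin^{-1}(x) = x + x^3/6 + O(x^5)$ gives the leading term $\ell^3/(24\alpha^2)$ of the excess. The Python sketch below (function names are ours) computes the ratio of the exact excess to that leading term, which should approach 1 as $\ell \to 0$.

```python
import math


def arc_length(chord, alpha):
    """Length 2*alpha*asin(chord/(2*alpha)) of the arc of radius alpha over a chord."""
    return 2 * alpha * math.asin(chord / (2 * alpha))


def excess_ratio(chord, alpha):
    """(arc - chord) divided by the leading term chord**3 / (24 * alpha**2)."""
    return (arc_length(chord, alpha) - chord) / (chord ** 3 / (24 * alpha ** 2))


if __name__ == "__main__":
    for ell in (0.1, 0.05, 0.01):
        print(ell, excess_ratio(ell, alpha=0.2))  # ratios approach 1 as ell -> 0
```

When the chord equals $2\alpha$, the arc is a half-circle of length $\pi\alpha$, which gives a quick sanity check of `arc_length`.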